ABSTRACT



This report presents the results of analysis of potential
outliers and missing data points in 3-D data. Treatments of
isolated and multiple questionable observations (potential outliers
and/or missing data points) are suggested for inclusion in the
method for fitting polynomials of order three or less.
I. INTRODUCTION

The purpose of this report is to present the results of a
study of methods of treatment of potential outliers (wild data
values) and missing points for inclusion in an algorithm for
smoothing of data at NUWES. Potential outliers and missing data
points can contaminate both data smoothing (Ref. 1) and geometric
analysis of vehicular paths (Ref. 2).

Data used in this investigation were obtained for a single
trial run at NUWES. (This run was labeled Trial #2 by this
investigator.) Two vehicles (A and B) were involved in this
trial. Plots of the horizontal and vertical paths of the two
vehicles are shown in Figures 1a,b. Missing points are circled in
Figure 1b and denoted by M in data lists. Potential outliers
are boxed in figures and denoted by W in data lists.

Data at every eighth scheduled data collection time is
missing. In addition, there are other missing data times.
Temporary values for these were established as the average of the
adjacent values (Ref. 1). Potential outliers are identified by
the use of sequential differences (Ref. 1) with any fourth order
difference (Δ4) having a magnitude of 50 or greater being
considered a potential outlier. (The selection of the threshold
of 50 is somewhat arbitrary as discussed in Reference 3.)
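
The detection rule just described can be sketched in a few lines of
Python. This is a minimal illustration and not the routine used at
NUWES. It assumes the fourth order difference is the centered form
x[i-2] - 4x[i-1] + 6x[i] - 4x[i+1] + x[i+2], which reproduces the Δ4
values listed for the z coordinate in Table 5a; the function names are
hypothetical.

```python
def fourth_differences(values):
    """Centered fourth order differences.

    Entries are None where the 5-point window does not fit
    (the first two and last two positions).
    """
    n = len(values)
    out = [None] * n
    for i in range(2, n - 2):
        out[i] = (values[i - 2] - 4 * values[i - 1] + 6 * values[i]
                  - 4 * values[i + 1] + values[i + 2])
    return out


def flag_by_threshold(d4, threshold=50.0):
    """Indices whose fourth difference has magnitude >= threshold."""
    return [i for i, d in enumerate(d4)
            if d is not None and abs(d) >= threshold]


# z coordinate for t = 2209 ... 2217, as read from Table 5a.
z = [-393.3, -379.1, -386.5, -394.8, -377.8, -407.5, -411.0, -404.2, -405.6]
d4 = fourth_differences(z)
flagged = flag_by_threshold(d4)   # positions of t = 2212, 2213, 2214
```

A bare threshold flags the neighbors of a wild value as well as the
value itself, because one bad observation enters five consecutive
fourth differences. Section III C discusses labeling only the largest
of such adjacent values.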

Data smoothing in this study, and proposed for inclusion
in data processing at NUWES, uses the 7-point Least-Squares
Polynomial Regression designed for 7 consecutive observations
with no missing data (Ref. 1).






A general discussion of the magnitude of the outlier and
missing point problem is presented in Section II. Their
treatment is discussed in Section III.
II. GENERAL



The magnitude of the problem of potential outliers and
missing points can be demonstrated by the frequency of their
occurrence in Trial 2. Observational times for vehicle A were
from t = 2084 to t = 2374 and included 246 observational times.
These included 2b scheduled missing times (M) and 7 additional
unscheduled ones for a total of 48 missing data times. There
were also 43 potential outliers (W) in this path with 6 of them
designated as both W and M. A summary of the occurrences of wild
and missing data is shown below.
A similar examination of the path of vehicle B was
also made. Observational times were from t = 2079 to t = 2358,
giving 280 observational times. These included 35 scheduled
missing times and 48 unscheduled ones for a total of 83. There
were 22 potential outliers in this data, none of which were also
missing data values. This is summarized in Table 1b.
Only a brief examination of the extent of wild and missing
data was made. Their causes are certainly of concern to data
collection personnel but procedures for treatment are of concern
for data processing. Some general comments are presented:

(1) There are about seven times as many unscheduled missing data
times for the path of vehicle B as there are for that of
vehicle A. A cursory examination suggests that these are
more prevalent in the path of vehicle B immediately following
its approach by vehicle A. Following two of the three
approaches in this trial, vehicle A was closer to the nearest
tracking array than vehicle B. (Between them? This may
be of interest to data collection. These segments of the
vehicular paths may be of lesser concern for data processing
because they may be of lesser interest to the personnel who
are the users of the smoothed data.)

(2) There are about twice as many potential outliers in the
path of vehicle A as in that of vehicle B. That their
frequency is greater is not unexpected since vehicle
B was doing less maneuvering (ostensibly, on a straight line
path). That 6 of the missing values in the path of vehicle
A are also designated as potential outliers should not be
unexpected. The temporary value inserted for missing values
using linear interpolation between adjacent values can be
expected to be inconsistent when the actual path is not
linear. Note that none of the missing values in the path of
vehicle B were also designated as potential outliers.
(3) It is interesting to note the low rate of occurrence of
potential outliers in more than one coordinate at the same
observation times. For the path of vehicle B these
occurred only b times and in only two were all three
coordinate values indicated as potential outliers. [...] of
the 43 potential outliers occurred in one coordinate only.
One might be tempted to expect greater multiplicity since any
discrepancy in data from the instrumentation arrays is
transformed to position coordinates and hence would be
expected to contaminate the values of all coordinates at
that observational time.






III. TREATMENT OF POTENTIAL OUTLIERS AND MISSING POINTS
A. General 

The procedure used in this study (and proposed for data
smoothing at NUWES) incorporates a 7-point Least-Squares (L-S)
polynomial computational routine to treat missing points and
potential outliers and, subsequently, for smoothing the rest
of the data. Since missing points and potential outliers
can contaminate the smoothing of other data points, they
should be treated first.

The combination of seven consecutive points for the 
smoothing routine and the regular scheduling of missing 
points (every eighth point) complicates the treatment. 
Operation of chance would dictate that only one time out of 
eight would a potential outlier or another random missing 
point be centered in the seven point segment between 
successive scheduled missing points. A missing point or a 
potential outlier centered in a seven point segment with no 
other missing points or potential outliers will be called 
isolated. These are the easiest to treat. The presence of
two or more missing points and/or potential outliers in the
same seven point data segment calls for more careful
treatment. As discussed in Reference 1, the presence of
three such points in a segment should be flagged to indicate
to potential users of the smoothed data that the data is of
reduced quality.

As discussed in Reference 1, isolated missing points
or potential outliers are treated by iterating the 7-point
L-S program, replacing the suspect value by the smoothed
value at each step and repeating until the smoothed value
has a residual error well within the noise of the remaining
values in the segment. Since the 'noise' in the NUWES
system has a standard deviation of 2 or less for good
quality data, the value of 1 has been selected as the
magnitude of the residual error for stopping the iteration.

The treatment of multiple missing points and/or 
potential outliers involves the same procedure with 
the suspect values replaced by the smoothed values at each 
step and the smoothing continued until all of the suspect 
values have residual errors within the specified level (1). 
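
The iteration described above can be sketched as follows. This is a
minimal Python illustration under stated assumptions, not the TI59
program of Reference 1: it fits polynomials of order 1, 2, and 3 to
the seven points using orthogonal polynomials on t' = -3, ..., 3,
keeps the order with the smallest SDR (divisor n-(k+1)), and replaces
only the center value. All function names are hypothetical. The
sample data are the z values for t = 2210 to 2216 as read from
Table 5b.

```python
import math

T = [-3, -2, -1, 0, 1, 2, 3]
# Orthogonal polynomials of degree 0..3 on the seven equally spaced points.
PHI = [
    [1.0 for t in T],
    [float(t) for t in T],
    [float(t * t - 4) for t in T],
    [(t ** 3 - 7 * t) / 6.0 for t in T],
]


def fit(values, order):
    """Least-squares polynomial fit; returns (fitted values, SDR)."""
    coeffs = [sum(p * v for p, v in zip(PHI[j], values)) /
              sum(p * p for p in PHI[j]) for j in range(order + 1)]
    fitted = [sum(coeffs[j] * PHI[j][i] for j in range(order + 1))
              for i in range(7)]
    ss = sum((v - f) ** 2 for v, f in zip(values, fitted))
    return fitted, math.sqrt(ss / (7 - (order + 1)))


def best_fit(values):
    """Fit of order 1, 2, or 3 with the smallest SDR."""
    return min((fit(values, k) for k in (1, 2, 3)), key=lambda r: r[1])


def treat_center(values, limit=1.0, max_iter=20):
    """Iterate until the center residual is below `limit` in magnitude.

    Returns (accepted smoothed value, number of iterations).
    The input list is not modified.
    """
    seg = list(values)
    for n in range(1, max_iter + 1):
        fitted, _ = best_fit(seg)
        residual = seg[3] - fitted[3]
        seg[3] = fitted[3]          # replace suspect value by smoothed value
        if abs(residual) < limit:
            return seg[3], n
    raise RuntimeError("residual did not fall below the limit")


segment = [-379.1, -386.5, -394.8, -377.8, -407.5, -411.0, -404.2]
value, iterations = treat_center(segment)
```

With unrounded intermediate values the successive center values follow
the same sequence as the columns of Table 5b (about -394.4, -400.0,
-401.9) before the stopping test is met. Small differences from a
hand calculation that rounds at each step are to be expected.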

A few missing points and potential outliers in Trial 2
are used to illustrate the smoothing procedure. These are
presented in the next section. A 7-point L-S Polynomial
program for the TI59 hand-held calculator (see Ref. 1) was
used in this treatment.






B. Treatment of Isolated Values
1. An Isolated Missing Point 

The isolated missing value selected for
illustration of the treatment occurred at time t = 2128
in the x coordinate of vehicle A. Data in the
vicinity of this point are presented in Figure 2 and
Table 2. Also presented in Table 2a are the sequential
differences.

Three iterations of the 7-point L-S polynomial
smoothing were performed (see Table 2b, columns 2, 3,
and 4). The first iteration showed a residual error
of r = -3.76. Replacing the temporary value
x = 33,810.9 by the smoothed value x' = 33,814.7
and performing the second iteration showed the residual
error reduced to r = -1.23. Again, replacing by
x' = 33,815.9 (the smoothed value in the second stage)
and iterating resulted in the smoothed value
x' = 33,816.3 with the residual error reduced to
r = -0.42. Since this residual error is less than one
in magnitude, the iteration was stopped. Note that
the smoothed value from the second iteration has a
residual error within the specified limits. The third
iteration was necessary to establish this. The residual
error of the third smoothed value will be even smaller.
Since the third estimate had to be computed in determining
that residual error, it is accepted as the smoothed value
for x at time t = 2128.
Figure 2. Missing Point (2Ax), t = 2128

Table 2a. Isolated Missing Point (2Ax)

Examination of Figure 2 and the first order successive
differences (Δ1) in Table 2a indicates that vehicle A was
undergoing a change in path in the vicinity of t = 2128.
Sequential differences were recalculated to determine if
this change might be indicated by a potential outlier
possible at t = 2129 or t = 2130. These values are also
presented in Table 2a. The fourth order sequential difference
at t = 2129 was increased in magnitude from 18.2 to 40.0 but
does not exceed the threshold of 50 so the change in path
was not detected by sequential differences.

Because of the change in path (maneuver) of vehicle A,
the effect of shifting the segment on the smoothed value was
explored. Segments with centers at t = 2126, 2127, 2129,
and 2130 were fitted. The smoothed values obtained are
presented in Table 2c together with the residual errors
(r) at t = 2128 and the standard deviations (SDR) of the
residual errors for the segments. The computations are
presented in Tables 2b.



Table 2c - Varying Segments for Smoothing

Segment      Smoothed x     Residual     Std. Dev.
Center       at t = 2128    Error (r)    (SDR)

2126         33810.3          0.57       1.57
2127         33811.7         -0.82       2.06
2128 (M)     33816.3         -0.43       8.55
2129         33813.8         -0.88       3.26
2130         33817.5         -0.81       5.71

Table 2b. Smoothing Missing Point at

There are several features in these tables that are worthy
of comment as follows:

(a) All of the smoothing applications involved a third
order polynomial since the standard deviation SDR
of the residual errors was smaller for the cubic
(SDR3) than for the linear (SDR1) or the quadratic
(SDR2). The cubic polynomial used to fit the
data segment with center at t = 2126 is of the
form

[sketch: x versus t for a cubic with b3 > 0]

since the coefficient (b3) of the third order term
is positive (b3 > 0). For all other segment
centers the cubic is of the form

[sketch: x versus t for a cubic with b3 < 0]

since the coefficient b3 is negative. These
results suggest that the data segments centered at
t = 2127 to t = 2130 included positions in the
maneuver.




(b) The smoothed values for x at time t = 2128 vary
more than 7 units depending upon the data segment
used for the smoothings. The question now arises
of which smoothed value provides the best estimate
of the x coordinate of vehicle A at time t = 2128.
The residual error at t = 2128 provides no help
here since it could be reduced to zero by
repeated iteration.

Note that, as discussed in the smoothing of
the segment centered at t = 2128, the residual
error is the difference between the temporary
value before the last iteration and the smoothed
value after that iteration and hence does not
represent an error in the smoothed value. It
should be noted also that further iteration to
reduce the residual error at t = 2128 will only
produce small reductions in the standard
deviations of the residual errors of the segments
since the purpose of the iterations is to reduce
the residual error at that point to a value well
within the residual errors at the other points in
the same data segments (i.e., small contribution
with respect to the 'noise' in the segments).

(c) Of greater use for selecting the most appropriate
data segment, and consequently the most
appropriate estimate for x at t = 2128, are the
values of the standard deviation of the residual
errors (SDR, in Table 2c). The standard
deviation of the residual errors is used in
establishing confidence intervals for the actual
value of the dependent variable (x) with smaller
values producing narrower confidence intervals
(Ref. 1). The data segment centered at t = 2126
had the smallest standard deviation and hence
could be considered to give the preferred
estimate.

The variation of the width of the confidence
interval with the degree of polynomial used to fit
the data segment and with the location of the
missing data point within the segment has not been
fully explored. The first degree polynomial was
treated in Reference 1 but similar expressions
for confidence intervals when second and third
order polynomials are used need further
development.

(d) There should be some concern about the effect of
the change in vehicular path on the smoothing of
the data. This change occurred in the vicinity of
times t = 2128 or 2129. (See Fig. 2)

Note, one possible explanation for the
increase in the value of SDR as the center of the
data segment is shifted is an increase in the
'noise' level in the observations. Another is the
inability of the third order polynomial to
represent the actual vehicular path adequately.

In order to avoid the latter possibility it would
appear desirable to avoid smoothing data with
segments including rapid changes in vehicular
paths. Referring, again, to Figure 2, it can be
seen that a rather abrupt change in the vehicular
path is apparent at time t = 2130 but that the
observation at time t = 2129 appears to be
consistent with the preceding values. Thus
exclusion of the observation at t = 2130 from the
segment would lead to using the 7-point data
segment centered at t = 2126. Further, this same
segment should also be used for subsequent
smoothing of data values at times 2127 and 2129
instead of using data segments centered at those
times. (Note that this suggestion of using the
data segment centered at t = 2126 to smooth the
value for the missing point at t = 2128 is in
accord with the discussion in comment c above.)
be as simple and short as possible.

These two guidelines are contradictory when it
comes to treatment of changes in vehicular paths.






It could be very awkward to construct subroutines
to implement automatic identification of the times
of changes in vehicular paths. On the other hand,
manual screening of the data to identify such
times would reduce the level of automation.

Fortunately there is another source of information
that could be made available to provide
this information. This is the internal control
information collected from the vehicles. It is
strongly recommended that this source of information
be explored with the intent of including it
with the data to be smoothed.

2. An Isolated Potential Outlier

As discussed in Section IIIA, isolated potential
outliers are rare. One occurrence in the trial used in
this report was the y coordinate of vehicle A at time
t = 2268. The data in the vicinity of the potential
outlier is presented in Table 3a together with the
sequential difference (Δ4). At t = 2268, Δ4 = 75.1,
which exceeds the selected threshold magnitude of 50,
and hence the y value at t = 2268 is indicated as a
potential outlier. A plot of the data is also
presented in Figure 3.

Treatment of a potential outlier is the same as
that for an isolated missing point. Four iterations
were required to ensure that the smoothed value to be
used as a replacement for the potential outlier was
consistent with the other six values in the data
segment, i.e., that the residual error of the smoothed
value was less than the specified magnitude of one.
The fourth iteration was required to determine whether
the smoothed value obtained in the third iteration
satisfied this criterion. As in the treatment of an
isolated missing point (Section III B1), the smoothed
value y' = 7033.4 established in the fourth iteration
was selected as the replacement for the observed value
y = 7048.8 and will have a residual error about the
fitted curve which is less than r = 0.39. The iterations
conducted on a TI59 are presented in Table 3b.

Figure 3. Isolated Potential Outlier

Table 3a

Treatment: observed value y = 7048.8 at t = 2268 replaced
by smoothed value y = 7033.4

There are several features of this treatment which
are worthy of comment.

(a) Sequential differences were recalculated after
replacing the potential outlier. These are presented
in the right hand part of Table 3a. The
fourth order difference at t = 2268 has been reduced
in magnitude from 75.1 to 1.9 and elimination
of the contamination of the fourth order
differences at the adjacent times has also reduced
their magnitude.

(b) It is of some interest to note that in the
first two iterations a second order polynomial
(parabola) provided the best fit (smallest SDR)
but a third order polynomial (cubic) gave a
slightly better fit in the last two iterations.
(c) The reduction in the residual error in the
potential outlier and in the standard deviation
(SDR) of the residual errors of the data segment
are worthy of notice. The residual error was
r = -10.39 for the potential outlier and the standard
deviation of the residual errors (the differences
between observational values and smoothed
values) was SDR2 = 6.95. The third iteration of
smoothing replaced the potential outlier value of
y = -7046.8 by y' = -7033.8, which has a residual
error of r = -0.4, and the standard deviation of
the residual errors was SDR3 = 2.70 (established
in the fourth iteration).



(d) The magnitudes of all of the residual errors when a
data segment is smoothed are of some concern. This
is represented by the value of the SDR which was
somewhat larger in all but one of the data
segments examined in the previous subsection
(III B1) where an isolated missing point was
considered. There are always some reservations in
the mind of this investigator (and should be in
the mind of any potential user of the smoothed
data) whether a larger value of the SDR is caused
by inadequacy of the model (polynomials of order
three or lower) or an increase in the level of
noise in the data.



Inadequacy of the model is not limited to major changes
in a vehicular path as apparently occurred in the missing point
example but could be produced by fish-tailing (snake action) for
vehicular control or minor corrections in attack path. A higher
data rate would improve the smoothing capabilities for following
such higher frequency path variations by allowing use of longer
path segments and/or higher order polynomials as well as improved
smoothing capabilities even when such path anomalies were not
present.

The presence of an unscheduled missing point or of an
outlier when the SDR for the residual error is large should not
be unexpected. It should serve as an indication that the
position location system is having difficulty in obtaining
consistent data on the vehicular path.

The inability of the smoothing procedure to distinguish
between inadequacy of model and noise as the cause for larger
values of the SDR should be recognized as a different kind of
inaccuracy of the model. In the development of the Least-Squares
Model it was assumed that the noise components of the observed
values were independent. Any persistence in the noise component
is thus treated as a portion of the actual path component. As an
extreme example, any constant portion of the noise component that
persists over an entire data segment will result in a bias in the
smoothed path, i.e., in an offset of the smoothed path from the
actual path of the vehicle.
3. An Isolated Missing Point/Potential Outlier



The fact that the temporary replacement of an
isolated missing point by the average of the values at
the adjacent points can produce a value which is
identified by the sequential differences as a potential
outlier is illustrated by the x-coordinate of
vehicle A at time t = 2136. The data segment and
the sequential differences are presented in Table 4a
and sketched in Figure 4. The four smoothing
iterations are shown in Table 4b.

The treatment here is not different from that of
an isolated missing point or a potential outlier. It
is included in this report to illustrate that the
temporary replacement of a missing point by the average
of the adjacent points is actually using a 2-point
straight line fit and hence may be substantially
different from the actual value of the component when
the vehicle is not traveling in a straight line.
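
A short numeric check makes the point concrete. The sketch below (in
Python, with a hypothetical function name) averages the two neighbors
of the missing point at t = 2136 and compares the result with the
value x* = 33,452.2 accepted after smoothing; all numbers are as read
from Tables 4a and 4b.

```python
def temporary_value(before, after):
    """2-point straight line fit: average of the adjacent observations."""
    return (before + after) / 2.0


x_2135 = 33486.6          # observation before the missing point (Table 4b)
x_2137 = 33446.4          # observation after the missing point (Table 4b)
x_smoothed = 33452.2      # value accepted after the smoothing iterations

x_temp = temporary_value(x_2135, x_2137)
offset = x_temp - x_smoothed   # how far the straight-line guess sits from the fit
```

The offset is about 14 units, several times the noise level of 2 quoted
for good quality data, which is why the temporary value is itself
flagged by the fourth order differences.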

One other side comment that may be of interest is
the magnitudes of the SDRs in the second and third
smoothing iterations in comparison with the actual
values of the residual errors. The SDR of the residual
errors is larger than any of the residual errors in
these iterations. This is a consequence of using

SDR = \left[ \frac{1}{n-4} \sum_i (r_i - \bar{r})^2 \right]^{1/2}

instead of the root-mean-square
Figure 4. Isolated Missing Point/Potential Outlier

Table 4a

Isolated Missing Point/Potential Outlier (2Ax)

                          Before        After Treatment
                          Treatment     (x* = 33,452.2)
  t         x             Δ4            Δ4

 2131      33,794.2
 2132      33,726.1        12.3
 2133      33,637.7       -23.4
 2134      33,556.5        34.3          20.1
 2135      33,486.6       -88.2         -31.6
 2136M     33,466.5       108.7          23.2
 2137      33,446.3       -82.0         -25.5
 2138      33,485.1         4.8          -9.4
 2139      33,559.3        -0.6
 2140      33,650.2        -5.9
 2141      33,733.5

RMS = \left[ \frac{1}{n} \sum_i r_i^2 \right]^{1/2}

when a cubic polynomial is fitted to a data segment of
n=7 observations. The unbiased estimator SDR of the
standard deviation for the noise component is considerably
increased over the RMS value because the data
segment is so short. An increase in the data rate to
increase n is desirable. Note that a kth order
polynomial would require a divisor of n-(k+1) since it
would involve k+1 coefficients. Thus a substantial
increase in n (e.g., doubling the data rate) would
permit some increase in the order of polynomials
considered for fitting the data segment without making the
value of SDR unrepresentative of the residual errors.
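
The effect of the divisor can be checked numerically. The Python
sketch below computes both quantities for the seven residual errors of
the third smoothing iteration, as best read from Table 4b; the
function names are hypothetical and the residual list is an
assumption to the extent that the table is hard to read.

```python
import math


def sdr(residuals, order):
    """Unbiased estimate: divisor n-(k+1) for a polynomial of order k."""
    n = len(residuals)
    mean = sum(residuals) / n
    ss = sum((r - mean) ** 2 for r in residuals)
    return math.sqrt(ss / (n - (order + 1)))


def rms(residuals):
    """Root-mean-square of the residual errors (divisor n)."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Residual errors at t' = -3 ... 3, third iteration, cubic fit.
r = [-1.10, 3.20, -2.76, 1.06, -2.09, 2.66, -0.96]

sdr3 = sdr(r, order=3)        # close to the SDR3 entry of Table 4b
rms3 = rms(r)
largest = max(abs(v) for v in r)
```

For n = 7 and a cubic the ratio SDR/RMS is essentially (7/3)^(1/2), or
about 1.5, which is enough to push SDR above the largest individual
residual in this segment.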
Table 4b

Isolated Missing Point/Potential Outlier (2Ax)

Smoothing Iteration

  t           x_i          x_i(2)       x_i(3)       x_i(4)

 2133        33,637.7
 2134        33,556.5
 2135        33,486.6
 2136 M,W    33,466.5     33,456.8     33,453.6     33,452.5
 2137        33,446.4
 2138        33,485.1
 2139        33,559.3

 SDR1        64.974       66.665       67.266       67.477
 SDR2         9.428        7.599        7.372        7.346
 SDR3         7.545        3.922        3.294        3.210
 R            3            3            3            3

 b3           0.925        0.925        0.925        0.925
 b2          15.7179      16.1798      16.3321      16.3845
 b1         -21.4143     -21.4143     -21.4143     -21.4143
 b0         -62.8714     -64.7190     -65.3286     -65.5381

 x'(-3)      33,637.6     33,638.5     33,638.8     33,638.9
 x'(-2)      33,555.1     33,553.8     33,553.3     33,553.1
 x'(-1)      33,493.1     33,490.3     33,489.4     33,489.1
 x'(0)       33,456.8     33,453.6     33,452.5     33,452.2
 x'(1)       33,452.1     33,449.3     33,448.4     33,448.1
 x'(2)       33,484.3     33,482.9     33,482.4     33,482.3
 x'(3)       33,559.0     33,560.0     33,560.3     33,560.4

 r(-3)        0.13        -0.80        -1.10        -1.20
 r(-2)        1.36         2.74         3.20         3.36
 r(-1)       -6.45        -3.68        -2.76        -2.45
 r(0)         9.66         3.19         1.06         0.32
 r(1)        -5.77        -3.00        -2.09        -1.77
 r(2)         0.81         2.20         2.66         2.81
 r(3)         0.26        -0.66        -0.96        -1.07

C. Treatment of Multiple Values

1. General Considerations

When more than one missing point and/or potential
outlier occur in the same 7-point data segment the
selection of the appropriate treatment is more difficult.
Treatment of data segments containing 3 or more
values which are either missing points or potential
outliers requires additional considerations and will be
postponed until the next section (Sect. D). Only
occurrences of two such values will be examined here.

Treatment of two such suspect values must take
into consideration the differences in the nature of
suspect values as well as their location in a 7-point
segment. There are three possible procedures:

(a) Smooth first one using iterations as necessary,
then the other using the smoothed value for the first.
It would appear advisable to resmooth the first again
after the second is smoothed. The question arising here
is which value should be smoothed first. In the case of
two potential outliers it would appear reasonable to
smooth first the one with the largest fourth order
difference (Δ4) as representing the greater potential
contaminator. In the case of a potential outlier and a
missing point it would appear reasonable to smooth the
potential outlier first for the same reason. In the
case of two missing points this reason is not pertinent
and a reasonable procedure would be to smooth the one
that occurred first in time for computational simplicity.

(b) Alternate smoothing iterations centered on first one
time then the other, continuing the iterations until the
residual errors of both are within the prescribed limits.
This procedure requires more computational effort since
the 7-point segments shift between each iteration. There
is also the possibility that, because different data
segments are involved, both residual errors cannot be
reduced to the prescribed level simultaneously.

(c) Simultaneous smoothing of the two values using a
single 7-point segment. The question here is where the
segment should be centered. This selection should take
into consideration the quality of the resulting smoothed
values.
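
The ordering suggested under procedure (a) can be written as a small
sorting rule. The Python sketch below is only an illustration: the
record layout (keys 't', 'kind', 'd4') and the function name are
hypothetical, and the sample values are invented for the example
rather than taken from the trial data.

```python
def smoothing_order(suspects):
    """Order suspect values for sequential smoothing, procedure (a).

    Each suspect is a dict with keys:
      't'    observation time
      'kind' 'W' for potential outlier, 'M' for missing point
      'd4'   fourth order difference at that time
    Potential outliers come first, largest |d4| first; missing
    points follow in time order.
    """
    outliers = sorted((s for s in suspects if s["kind"] == "W"),
                      key=lambda s: -abs(s["d4"]))
    missing = sorted((s for s in suspects if s["kind"] == "M"),
                     key=lambda s: s["t"])
    return outliers + missing


# Invented example: one missing point and two potential outliers.
example = [
    {"t": 100, "kind": "M", "d4": 12.0},
    {"t": 102, "kind": "W", "d4": -61.0},
    {"t": 104, "kind": "W", "d4": 95.0},
]
order = [s["t"] for s in smoothing_order(example)]
```

The rule covers only the choice of which value to smooth first. It
does not decide between procedures (a), (b), and (c), which depends on
the separation of the two values as discussed below.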

As discussed in Reference 1, the quality of a
smoothed value can be expressed in terms of the width of
the confidence interval for the actual value at any time.
This confidence interval is of the form

CI_{1-\alpha}(x_t) = x'(t) \pm c_{\alpha/2} \, SDR \left[ \frac{1}{n} + \frac{(t-\bar{t})^2}{\sum_i (t_i-\bar{t})^2} \right]^{1/2}
                   = x'(t) \pm c_{\alpha/2} \, SDR \left[ \frac{1}{7} + \frac{t'^2}{28} \right]^{1/2}        (1)

when the t values are translated to t' = -3, -2, -1, 0, 1,
2, 3 for the 7-point segment when the fitting polynomial
is linear. (The comparable forms for quadratic and cubic
polynomials have not been explored.) Thus the values to
be smoothed should be as close to the center of the
segment as possible since the confidence interval will be
shortest when t' = 0.
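
The growth of the interval away from the center can be tabulated
directly. The Python sketch below evaluates only the bracketed factor
[1/n + t'^2/sum(t'^2)]^(1/2) for the linear case with n = 7, where the
sum of squares of t' = -3, ..., 3 is 28. The function name is
hypothetical, and the critical value and SDR are left out because they
multiply every position equally.

```python
import math


def interval_factor(t_prime, n=7):
    """Width factor of the confidence interval for a linear fit.

    Assumes n equally spaced times translated to -(n//2) ... n//2.
    """
    half = n // 2
    sum_sq = sum(t * t for t in range(-half, half + 1))   # 28 for n = 7
    return math.sqrt(1.0 / n + t_prime ** 2 / sum_sq)


factors = {t: interval_factor(t) for t in range(0, 4)}
relative = {t: f / factors[0] for t, f in factors.items()}
```

Relative to the center, the interval is about 12 percent wider one
point away and about 80 percent wider at the end of the segment, which
supports keeping suspect values near t' = 0.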

Situations in which adjacent points are both
potential outliers are unlikely to occur with the
identification procedure specified (Ref. 3) since only
the point having the largest fourth order difference (Δ4)
exceeding the prescribed level (50) has been so labeled.
(To guard against outliers close to each other,
sequential differences should be recalculated whenever a
potential outlier has been smoothed.) This is
illustrated in Section III C 2.

Situations in which adjacent points consist of a
potential outlier and a missing point should be treated
simultaneously using the data segment centered on the
potential outlier since it contaminates the temporary
value assigned for the missing point. This is illustrated
in Section III C 3.

For situations with adjacent missing points,
simultaneous smoothing is again recommended. It is,
however, ambiguous as to which one should be used as the
center of the data segment used for smoothing. This is
examined in Section III C 4 for one such occurrence in
the trial run.

When a missing point occurs in the 7-point data segment
centered at a potential outlier but is not adjacent to
it, there is some question as to whether it should be
smoothed simultaneously or subsequent to the treatment
of the potential outlier. This has not been examined
but, on the principle of making the associated confidence
interval as short as possible, the latter would appear
preferable.

For two missing points in the same 7-point data
segment which are not adjacent, the treatment can be
different depending on their separation. If they are
separated by only one point the possibility of simultaneous
smoothing using that point as the center of the data
segment would be advantageous from a computational
viewpoint and would not cause a substantial increase in
the width of the confidence interval. This can be seen
in the factor

\frac{1}{n} + \frac{t'^2}{\sum t'^2} = \frac{1}{7} + \frac{1}{28}

for t' = 1 in Equation (1). This situation is
examined in Section III C 5. The situation when two
missing points are separated by two other points is also
explored in Section III C 5.

2. Two Potential Outliers

As discussed in Reference 3, a large fourth order
sequential difference (Δ4) indicating a potential outlier is
typically accompanied by large Δ4's for the adjacent values but
with opposite signs. These may also exceed the specified
threshold but should not initially be labeled as potential
outliers. This is illustrated in Table 5a and Figure 5.

Note that the Δ4's at times 2212, 2213, and 2214 all exceed the
threshold 50, that their signs alternate, and that the magnitude
of Δ4 at 2213 is largest. Only the value of z at t = 2213
should be considered a potential outlier. Smoothing this value
(Table 5b) and recalculating the Δ4's verifies that the values
at t = 2212 and 2214 are not potential outliers and that their
Δ4's were contaminated by the designated outlier at t = 2213.
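
The labeling rule and the recheck can be sketched together. The
Python below is a minimal illustration with hypothetical function
names: it assumes the centered form of the fourth order difference,
keeps only the largest-magnitude Δ4 in each run of consecutive values
at or above the threshold, and then recomputes the differences after
the designated value is replaced. The z values and the replacement
z* = -402.3 are as best read from Table 5a.

```python
def fourth_differences(values):
    """Centered fourth order differences (None where undefined)."""
    n = len(values)
    out = [None] * n
    for i in range(2, n - 2):
        out[i] = (values[i - 2] - 4 * values[i - 1] + 6 * values[i]
                  - 4 * values[i + 1] + values[i + 2])
    return out


def largest_in_each_run(d4, threshold=50.0):
    """For each run of consecutive |d4| >= threshold keep the largest."""
    picks, run = [], []
    for i, d in enumerate(d4 + [None]):      # sentinel closes a final run
        if d is not None and abs(d) >= threshold:
            run.append(i)
        elif run:
            picks.append(max(run, key=lambda j: abs(d4[j])))
            run = []
    return picks


# z coordinate for t = 2209 ... 2217 (index 4 is t = 2213).
z = [-393.3, -379.1, -386.5, -394.8, -377.8, -407.5, -411.0, -404.2, -405.6]
before = fourth_differences(z)
suspects = largest_in_each_run(before)       # only the value at t = 2213

z_treated = list(z)
z_treated[4] = -402.3                        # smoothed replacement
after = fourth_differences(z_treated)
still_flagged = largest_in_each_run(after)   # empty if the neighbors were only contaminated
```

After the replacement every recomputed Δ4 in the segment is below 50 in
magnitude, so the large values at t = 2212 and 2214 were produced by
the single wild value at t = 2213.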

It the second calculation indicates another potential 
outlier in the vicinity of the first one, tnen the suggested 
procedure tor smoothing can depend on tneir separation. The data 
from Trial Run 42 was not examined to see whether this occurred. 
Treatment for such a situation is the same as that tor two missing 
points and will be presented in Section III C 3. 
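
The screening step described above can be sketched in a few lines of Python. This is only an illustration, not the program used in the study: it assumes the fourth-order difference is the centered form z(i-2) - 4z(i-1) + 6z(i) - 4z(i+1) + z(i+2), which reproduces the Δ4 values listed in Table 5a, and it assumes a single cluster of exceedances, flagging only the time with the largest |Δ4|. The function names are hypothetical.

```python
def fourth_differences(values):
    """Centered fourth-order differences; None where two neighbours are missing."""
    out = [None] * len(values)
    for i in range(2, len(values) - 2):
        out[i] = (values[i - 2] - 4 * values[i - 1] + 6 * values[i]
                  - 4 * values[i + 1] + values[i + 2])
    return out


def flag_potential_outlier(times, values, threshold=50.0):
    """Return the time of the largest |D4| at or above the threshold, or None.

    Assumption: all exceedances belong to one cluster caused by one outlier,
    so only the largest is labeled (its neighbours are left for rechecking).
    """
    d4 = fourth_differences(values)
    hits = [i for i, d in enumerate(d4) if d is not None and abs(d) >= threshold]
    if not hits:
        return None
    return times[max(hits, key=lambda i: abs(d4[i]))]


# z values for t = 2209 ... 2217 as listed in Table 5a
times = list(range(2209, 2218))
z = [-393.3, -379.1, -386.5, -394.8, -377.8, -407.5, -411.0, -404.2, -405.6]
d4 = fourth_differences(z)
flagged = flag_potential_outlier(times, z)
```

After the flagged value has been replaced by its smoothed value, the same function would be applied again to check whether the neighbouring exceedances disappear.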
Table 5a  Potential Outlier at t = 2213 (2BZ)

  t'_i    t_i        z_i        Δ4      Δ4* (z = -402.3)
   -4    2209 M    -393.3
   -3    2210      -379.1      42.3
   -2    2211      -386.5       5.5     (illegible)
   -1    2212      -394.8     -98.2       -2.2
    0    2213 W    -377.8     144.9       -2.1
   +1    2214      -407.5     -88.8       -9.2
   +2    2215      -411.0      -2.6      -27.1
   +3    2216      -404.2      26.7
   +4    2217      -405.6
Figure 5  Potential Outlier at t = 2213 (2BZ)
Table 5b  Smoothing Potential Outlier in 2BZ at t = 2213

    t       t'      z_i      z_i(2)    z_i(3)    z_i(4)
  2210     -3     -379.1
  2211     -2     -386.5
  2212     -1     -394.8
  2213 W    0     -377.8    -394.4    -400.0    -401.9
  2214     +1     -407.5
  2215     +2     -411.0
  2216     +3     -404.2

  SDR1              9.435     5.093     5.096     5.332
  SDR2             10.548     4.320     2.854     2.636
  SDR3             11.842     4.093     1.651     1.065
  R                 1         3         3         3
  b3                --        .33611    .33611    .33611
  b2                --        .80952   1.0762    1.1667
  b1               -4.8929   -7.2456   -7.2456   -7.2456
  b0             -394.41     -3.2381   -4.3048   -4.6667

  z'(-3)         -379.7    -380.1    -379.5    -379.4
  z'(-2)         -384.6    -385.0    -385.8    -386.1
  z'(-1)         -389.5    -392.3    -393.9    -394.4
  z'(0)          -394.4    -400.0    -401.9    -402.5
  z'(+1)         -399.3    -406.1    -407.7    -408.3
  z'(+2)         -404.2    -408.6    -409.4    -409.7
  z'(+3)         -409.1    -405.4    -404.9    -404.7

  r(-3)              .64       .98       .44       .26
  r(-2)            -1.87     -1.52      -.72      -.45
  r(-1)            -5.28     -2.50      -.90      -.35
  r(0)             16.61      5.62      1.89       .62
  r(+1)            -8.19     -1.38       .22       .77
  r(+2)            -6.80     -2.41     -1.61     -1.34
  r(+3)             4.89      1.20       .67       .49

III. C 3 A Potential Outlier and a Missing Point

When a missing point is adjacent to a potential outlier,
its temporary value is the average of the potential outlier and
the neighboring value on its other side. It would appear
reasonable for this situation to smooth the two values
simultaneously using the data segment centered on the potential
outlier. An example of this occurred at times 2175 (W) and 2176 (M)
in the x coordinate of vehicle A in Trial 2. The
appropriate data are presented in Table 6a and Figure 6. The
TI 59 calculator output is shown in Table 6b. Sequential
differences were recalculated since a potential outlier was
present, and the Δ4*'s are also presented in Table 6a.

If a missing point occurs in the data segment centered
at a potential outlier but is not adjacent to it, then
simultaneous smoothing may not be appropriate. Note that the
factor

$$\frac{1}{7} + \frac{t'^2}{28}$$

in determining the width of the confidence interval is

$$\frac{1}{7} + \frac{4}{28} = \frac{2}{7} \ \text{for } t' = \pm 2 \quad \text{and} \quad \frac{1}{7} + \frac{9}{28} = \frac{13}{28} \ \text{for } t' = \pm 3 .$$

The width of the confidence interval at these times is
substantially increased. It would appear reasonable in such
situations to smooth the potential outlier first, then smooth the
missing point, and, if desired, to resmooth the potential outlier.
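
The growth of this factor with distance from the segment center can be checked with exact fractions. The sketch below assumes the factor has the form 1/n + t'^2 / Σt'^2 for a segment of n = 7 equally spaced points (Σt'^2 = 28), which is consistent with the values quoted above; the function name is hypothetical.

```python
from fractions import Fraction


def ci_factor(t_prime, half_width=3):
    """Factor 1/n + t'^2 / sum(t'^2) for a segment with t' = -half_width ... +half_width."""
    offsets = range(-half_width, half_width + 1)
    n = len(offsets)
    sum_sq = sum(t * t for t in offsets)
    return Fraction(1, n) + Fraction(t_prime * t_prime, sum_sq)


# Factor at the center, adjacent to it, and further out
factors = {t: ci_factor(t) for t in (0, 1, 2, 3)}
```

The factor at t' = ±3 is more than three times its value at the center, which is the reason for keeping a questionable value at or next to the segment center whenever possible.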
Table 6a  Adjacent Potential Outlier and Missing Point

                                           Δ4*
                                      (x_0 = 34,463.4)
   t'     t          x_i        Δ4    (x_1 = 34,477.1)
   -3    2172      34,341.7    -30.8
   -2    2173      34,395.5     17.7      -22.0
   -1    2174      34,436.6    -36.1        2.8
    0    2175 W    34,472.3     63.1       -3.7
   +1    2176 M    34,473.8    -45.6       10.1
   +2    2177      34,475.2      1.8      -20.5
   +3    2178      34,465.3     21.4       24.7
   +4    2179      34,434.5     -9.6

Figure 6  Adjacent Potential Outlier and Missing Point



Table 6b  Smoothing Adjacent Potential Outlier and Missing Point

    t       t'       x_i        x_i(2)      x_i(3)
  2172     -3     34,341.7
  2173     -2     34,395.5
  2174     -1     34,436.6
  2175 W    0     34,472.3    34,465.1    34,464.0
  2176 M   +1     34,473.8    34,478.4    34,477.6
  2177     +2     34,475.2
  2178     +3     34,465.3
  2179     +4     34,434.5

  SDR1              28.868      27.867      27.525
  SDR2               4.665       1.546       1.289
  SDR3               5.150       1.715       1.323
  R                  2           2           2
  b3                 --          --          --
  b2                -6.9690     -6.7905     -6.7095
  b1                20.2643     20.4286     20.4
  b0                27.8762     27.1619     26.8381

  x'(-3)          34,341.6    34,341.6    34,341.8
  x'(-2)          34,396.7    34,396.0    34,395.8
  x'(-1)          34,437.8    34,436.8    34,436.3
  x'(0)           34,465.1    34,464.0    34,463.4
  x'(+1)          34,478.4    34,477.6    34,477.1
  x'(+2)          34,477.7    34,477.7    34,477.4
  x'(+3)          34,463.1    34,464.2    34,464.2

  r(-3)               0.14        0.11       -0.11
  r(-2)              -1.17       -0.47       -0.26
  r(-1)              -1.24       -0.17        0.31
  r(0)                7.22        1.11        0.60
  r(+1)              -4.57        0.77        0.51
  r(+2)              -2.53       -2.49       -2.16
  r(+3)               2.15        1.14        1.09



III. C 4 Two Missing Points

Some exploration will be presented here of the effects
of different treatments of two suspect values when they are
adjacent, separated by a single value, or separated by two values.
First, consider a situation with two adjacent missing points with
no other suspect values in the 7-point data segment centered on
either of them. It would appear reasonable to use simultaneous
smoothing with the data segment centered on either one. An
example of this in Trial 2 data is presented in Table 7 and
Figure 7. The outputs of the TI 59 calculator smoothing are shown
in Table 7 for the data segments centered at t = 2352 and t = 2353.
This example is of some interest since the fitted cubic polynomial
changes drastically with the shift of one unit in the data segment
location. This is indicated by the coefficient b3 of t'^3, which
is positive when the segment is centered at t = 2352 and is
negative when the segment is centered at t = 2353 (see Section III
o 1). In spite of this difference in the fitted cubic
polynomials, the smoothed values do not differ drastically from
each other or from the temporary values initially used. Whether
the differences in the smoothed values shown in Table 7 are of
concern to potential users of the smoothed data is uncertain. If
they are not, then simultaneous smoothing can be used with either
missing point at the center of the data segment.

Smoothing of these points using simultaneous smoothing but
alternating the center between the missing points at successive
iterations has not been explored since only one smoothing step
Figure 7. Adjacent Missing Points.
Table 7  Smoothing Adjacent Missing Points

    t       t' (1)     x_i (1)      t' (2)     x_i (2)
  2349       -3      33,015.3
  2350       -2      33,021.1        -3
  2351       -1      33,022.0        -2
  2352 M      0      33,024.0        -1
  2353 M     +1      33,026.1         0
  2354       +2      33,028.1        +1
  2355       +3      33,033.0        +2
  2356               33,027.0        +3

  SDR1                  1.334                    2.443
  SDR2                  1.490                    2.499
  SDR3                   .736                    1.904
  R                     3                        3
  b3                    0.1833                  -0.2556
  b2                    0.0143                  -0.2405
  b1                    1.2595                   3.3532
  b0                   -0.0571                   0.9619

  x'(-3)             33,015.6                 33,021.6
  x'(-2)             33,020.2                 33,021.2
  x'(-1)             33,022.7                 33,023.5
  x'(0)              33,024.2                 33,026.9
  x'(+1)             33,025.6                 33,029.7
  x'(+2)             33,028.2                 33,030.6
  x'(+3)             33,033.0                 33,027.9

  r(-3)                 -0.27
  r(-2)                  0.86
  r(-1)                 -0.74
  r(0)                  -0.17
  r(+1)                  0.47
  r(+2)                 -0.11
  r(+3)                 -0.03

  (1) Segment center at t = 2352
  (2) Segment center at t = 2353
using either center brings residual errors for both replacement
values within the desired level (|r_i| < 1). Such alternation of
data segment centers could require substantial computational
effort using the TI 59 calculator and some increase in the program
and computational effort on a large computer.
III. C 5 Two Missing Points Separated by a Single Point

When two missing points are separated by a single point,
the obvious choice is between (1) smoothing first one missing
point using the data segment centered on it and then the other
missing point using the data segment centered on it, with the
smoothed value substituted for the first missing point, and
(2) smoothing both values simultaneously using the data segment
centered on the point between them. This situation is illustrated
in Table 8 and Figure 8. The results of smoothing first for the
missing point at t = 2111 and, subsequently, for the missing point
at t = 2113 are shown in Table 8. The results of smoothing first
at t = 2113 and then at t = 2111 are also shown in Table 8.
Smoothing both missing points simultaneously produced the results
shown in Table 8 (last two columns). The results are summarized
below.



Smoothing ' 


Smoothed 


Values 


Procedure 


t = 2111 


t = 2113 


1 

Temp. Values 


33 , 339.8 


I 33,470.4 


First at 2111 


33 , 389 . 5 


| 33,468.7 

1 


First at 2113 


33 ,389.2 


j 33,870.5 


Simultaneous 1 


33,887.9 


| 33,870.3 



Mote that the greatest difference between the smoocneo values is 
less than 2 units. If this difference is not considered to be 
serious then the simpler procedure of simultaneous smoothing could 
be preferred. 
Figure 8. Separated Missing Points (2BX)

Table 8. Missing Points at t = 2111, 2113 (2BX)
There is some concern about this procedure, however,
because of the large residual errors at t = 2112 and t = 2114.
This concern is also supported by the large values of the SDR's
(the Standard Deviations of the Residual Errors). On examination
of Figure 8 it can be seen that there are two possible explanations
of the large values of the SDR's. The first is that the
actual vehicle track is inadequately represented by a cubic
polynomial (model error). The other is that the noise level in this
path segment is greater than normal. Which explanation is correct
cannot be determined from the data. Hopefully, vehicular control
information and maneuver capabilities will be of use here.

Smoothing the value at t = 2112 should not be performed
until after smoothed values have been established for the missing
points, so that its observed value is included in establishing
their smoothed values. Only then should the value at t = 2112 be
smoothed using the smoothed values for the missing points.



III. C 6 Missing Points Separated by Two Points

When missing points are separated by two observed
values, simultaneous smoothing appears questionable since,
whatever data segment is used, one of the missing points will not be
adjacent to the center of the data segment. The preferred
procedure would appear to be to smooth one of the missing points
first using the data segment centered at that missing point, then
do the same for the other point. If, when the second missing
point is smoothed, the residual error of the smoothed value for
the first missing point is large (arbitrarily, greater than unity),
it would seem reasonable to resmooth the first.

An example where two missing points are separated by two
observed values occurs in the data for the second vehicle (2BX),
where there are missing points at t = 2073 and t = 2076. The data
and graph are presented in Table 9 and Figure 9. Smoothing first
for the missing point at t = 2073 produced the results shown in
Table 9. Since the residual error at t = 2076 is less than
unity, the temporary value at t = 2076 was not subsequently
smoothed using the smoothed value at t = 2073 as suggested above.
Instead, the value at t = 2076 was smoothed using the temporary
value for the missing point at t = 2073. Again, both residual
errors were within the specified limit of unity. It would appear
that the suggested procedure of smoothing first one missing point
and then the other, including the smoothed value for the first, is
not always necessary. That it was not necessary in this example is
no guarantee, however, that it will not be desirable in other cases.
Figure 9. Separated Missing Points

Table 9. Missing Points Separated by Two Points (2BX)

                                    Segment Center
    t         x_i          2073        2076        2074        2075
  2070     34,000.6
  2071     33,997.3
  2072     33,989.8
  2073 M   33,991.0
  2074     33,992.1
  2075     33,986.3
  2076 M   33,986.6
  2077     33,986.8
  2078     33,984.8
  2079     33,986.6

  SDR1                    2.591       1.856       2.426       1.759
  SDR2                    2.517       1.681       2.470       1.914
  SDR3                    2.624       1.856       2.795       1.912
  R                       2           2           1           1
  b3                      --          --          --          --
  b2                      0.3131      0.2655      --          --
  b1                     -2.2036     -0.975      -1.5214     -1.0321
  b0                     -1.2524     -1.0619   33,990.0    33,988.2

  x'(-3)               34,000.1    33,992.0    33,994.6    33,991.3
  x'(-2)               33,996.4    33,989.7    33,993.0    33,990.3
  x'(-1)               33,993.2    33,987.9    33,991.5    33,989.2
  x'(0)                33,990.7    33,986.7    33,990.0    33,988.2
  x'(+1)               33,988.8    33,986.0    33,988.5    33,987.2
  x'(+2)               33,987.6    33,985.8    33,986.9    33,986.1
  x'(+3)               33,986.9    33,986.1    33,985.4    33,985.1

  r(-3)                    0.47       -1.00        2.75       -1.50
  r(-2)                    0.94        2.41       -3.23        0.74
  r(-1)                   -3.42       -1.62       -0.51        2.87
  r(0)                     0.30       -0.08        2.11       -1.90
  r(+1)                    3.29        0.83       -2.16       -0.57
  r(+2)                   -1.25       -0.99       -0.34        0.66
  r(+3)                   -0.31        0.45        1.38       -0.30
As a further exploration of this example, simultaneous
smoothing for the missing values at t = 2073 and t = 2076 was
performed using data segments centered at t = 2074 and at t = 2075.
The results are shown in Table 9 also. In this example, data
segments centered at any one of the four points appear to be
acceptable for establishing smoothed values for the missing points.
Subsequent smoothing should, however, still be performed for the
values at t = 2074 and 2075.

Although the SDR's are reasonably small for all choices
of data segments, it is of some interest to compare the graph in
Figure 9 with the one in the previous section (Figure 8). The
scales on the y-axis are different but there appears to be some
element of doubt about the actual path here also. Note that the
smoothing procedure used second order polynomials to fit the
segments centered at t = 2073 and 2076 but used first-order
polynomials to fit the data segments centered at t = 2074 and 2075.
III D. TREATMENT OF MORE THAN TWO MISSING POINTS AND/OR POTENTIAL
OUTLIERS.

1. General Discussion

When more than three questionable values, either
missing points or potential outliers, are present in a 7-point
data segment, they cannot be smoothed to establish estimated values
by a cubic equation. When there are three questionable values, they
can be treated by either (1) iterated simultaneous smoothing or
(2) establishing the cubic equation that fits the remaining four
points in the segment exactly and then using that cubic equation to
determine values for the three questionable points. When the same
7-point segment is used, the smoothing treatment (1) should converge
to the exact fit (2). An example of this situation is explored in
Section III D 2.

Similarly, if there are four questionable values in a given
7-point data segment, the remaining three observations can be fitted
exactly by a second order polynomial (parabola), or iterated
simultaneous smoothing can be used to fit the parabola. Also, if
there are five questionable values, the remaining two observations
can be used to fit a first-order polynomial (a straight line) to
these observations by either method.

It should be noted that the critical number of observations in
a 7-point data segment required for fitting a polynomial of order
k is k+1 since there are k+1 coefficients in the polynomial. If
there are fewer than k+1 observations available then the polynomial
cannot be established uniquely. If there are exactly k+1
observations, it can either be fitted exactly or approximated by
simultaneous iterated smoothing. If there are more than k+1
observations then only smoothing is appropriate.
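
The counting rule in the preceding paragraph can be written down directly. The following Python sketch is illustrative only; the function name and the three labels it returns are hypothetical and are not taken from any program described in this report.

```python
def fit_mode(n_questionable, order, segment_length=7):
    """Classify a segment by comparing usable observations with k + 1.

    "underdetermined": fewer than k + 1 observations, no unique polynomial
    "exact":           exactly k + 1 observations (exact fit, or iterated
                       simultaneous smoothing converging to it)
    "smoothing":       more than k + 1 observations, least squares applies
    """
    n_observed = segment_length - n_questionable
    n_coefficients = order + 1
    if n_observed < n_coefficients:
        return "underdetermined"
    if n_observed == n_coefficients:
        return "exact"
    return "smoothing"


# Cases discussed in the text, as (questionable values, polynomial order)
modes = {case: fit_mode(*case) for case in [(2, 3), (3, 3), (4, 3), (4, 2), (5, 1)]}
```

For example, three questionable values leave four observations, which is exactly the number needed for a cubic, while four questionable values leave too few for a cubic but exactly enough for a parabola.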

It is also important to note that when a k-th order polynomial
is fitted exactly to k+1 observations the standard deviation of the
residual errors (SDR) is zero. In essence, the noise component is
absorbed in the fitted polynomial and no estimate of the magnitude
of the noise is possible. This absorption of the noise component
into the target path is in contrast to situations (Section III B 3,
for example) where polynomials of order three or less provide
inadequate representations of the vehicle's path and hence part of
the path variations are treated as noise. This results in larger
standard deviations of the residual errors (SDR). It is worthy of
emphasis, again, that a large value for SDR could be caused by
either a large noise component or inadequacy of the polynomial
model to represent the actual target path. It is important to
determine which cause is pertinent. Potential sources for this
information are internal control data for vehicular maneuvers and
examination of plots of the vehicle path. The latter would be
difficult to incorporate into a data smoothing algorithm for
automatic data processing (some human interaction may be
necessary). The use of internal control data appears to be a better
approach if the goal of complete automation is to be achieved.
III D 2. THREE QUESTIONABLE VALUES

The problem with three questionable values in a 7-point data
segment will be illustrated by the z-component of vehicle A,
where there are potential outliers at t = 2125 and t = 2127, and
a missing value at t = 2128. Values of the z_i's in a
region containing possible 7-point data segments are plotted in
Figure 10 and listed in Table 10a. The fourth-order differences
(Δ4) are listed in the third column.

Selection of the appropriate 7-point data segment is the first
consideration. Centering it at t = 2125 would place the missing
value at t = 2128 at the end of the segment and would not appear as
desirable as centering it at t = 2126 or at t = 2127 to include the
value on the other side of the missing point in the segment.
Initially, it was decided to center the segment on the time between
the potential outliers (t = 2126) so both potential outliers would
be adjacent to the segment center and the missing point would not
be an end point.

The 7-point L-S Polynomial Smoothing program was used to
perform simultaneous iterative smoothing of the three questionable
values, with the results shown in Table 10b and the fourth column
in Table 10a. Eight iterations were required to bring the residual
errors of all three values within the prescribed level (|r_i| < 1.0).
Fourth-order sequential differences were recalculated and are shown
in column 5 of Table 10a. Note that no potential outliers are now
indicated although the value of Δ4 at t = 2124 is close to the
selected threshold of 50.
Figure 10. Three Questionable Values (2AZ)

Table 10a. 2AZ: t = 2125 W, 2127 W, 2128 M

Table 10b. Smoothing 2125 W, 2127 W, 2128 M (Center at t = 2126)




The four observations in this segment that were not considered
questionable were next fitted by a cubic polynomial,

$$z^*(t') = b_0 + b_1 t' + b_2 t'^2 + b_3 t'^3$$

where

   t_i      2123   2124   2125   2126   2127   2128   2129
   t'_i      -3     -2     -1      0     +1     +2     +3

so that

   t'_i       -3         -2          0         +3
   z_i      -426.3     -428.3     -465.3     -557.7
The derivation of the cubic equation fitting these four points
exactly gives (Table 10c)

$$z^*(t') = -465.3 - 26.46\,t' - 2.96667\,t'^2 + 0.506667\,t'^3 .$$

Estimates of the values at the times of the questionable values
(t' = -1, +1, +2) were established using this equation and are
presented in column 6 of Table 10a. Sequential differences were
recalculated and the fourth order differences are presented in
column 7 of Table 10a.
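
The exact-fit estimates can be cross-checked without solving for the coefficients, since the cubic through four points is unique. The Python sketch below uses Lagrange interpolation for this purpose; this is a different route from the elimination shown in Table 10c and is offered only as a check, with a hypothetical function name.

```python
def lagrange_eval(points, t):
    """Evaluate the unique polynomial through `points` [(t_i, z_i), ...] at t."""
    total = 0.0
    for i, (ti, zi) in enumerate(points):
        weight = 1.0
        for j, (tj, _) in enumerate(points):
            if j != i:
                weight *= (t - tj) / (ti - tj)
        total += zi * weight
    return total


# The four observations not considered questionable (segment center t = 2126)
observed = [(-3, -426.3), (-2, -428.3), (0, -465.3), (3, -557.7)]

# Estimates at the questionable times t' = -1, +1, +2 (t = 2125, 2127, 2128)
estimates = {t: lagrange_eval(observed, t) for t in (-1, 1, 2)}
```

The three estimates agree with the values obtained from the cubic equation above when rounded to one decimal place.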

Comparison of the values in columns 2, 4, and 6 indicates the
following:

(a) The observed values at t = 2125 and t = 2127 are
inconsistent with the rest of the observations (at t = 2125,
z_i - ẑ_i = 17.3 and z_i - z*_i = 17.9, and at t = 2127,
z_i - ẑ_i = 19.9 and z_i - z*_i = 24.4) so that both potential
outliers should be reclassified as actual outliers.

(b) The smoothed values, ẑ(t'), are fairly close to the
estimates z*(t') after eight iterations. More iterations should
bring them still closer but the iterations were stopped when the
residual errors were reduced to less than unity at all three
suspect times.

The fourth order differences in column 7 of Table 10a indicate
that there is a new potential outlier at time t = 2129. On
reference to the graph (Fig. 10), it appears that the observation at
this time is not necessarily an outlier but that there is a change
in the path of the vehicle which cannot be adequately approximated
by a cubic polynomial beyond this point.
Table 10c  Exact Solution: Segment center at t = 2126

    t        t'       z_i        z*_i
  2123      -3      -426.3     -426.3
  2124      -2      -428.3     -428.3
  2125 W    -1        --       -442.3
  2126       0      -465.3     -465.3
  2127 W    +1        --       -494.2
  2128 M    +2        --       -526.0
  2129      +3      -557.7     -557.7

Fit cubic z*(t') = b0 + b1 t' + b2 t'^2 + b3 t'^3 at t' = -3, -2, 0, +3:

  (1)    z*(-3) = b0 - 3b1 + 9b2 - 27b3 = -426.3
  (2)    z*(-2) = b0 - 2b1 + 4b2 -  8b3 = -428.3
  (3)    z*(0)  = b0                    = -465.3        b0 = -465.3
  (4)    z*(+3) = b0 + 3b1 + 9b2 + 27b3 = -557.7

Substitute b0 (3) in (1), (2), (4) and divide by t':

  (1')   b1 - 3b2 + 9b3 = -13.0
  (2')   b1 - 2b2 + 4b3 = -18.5
  (4')   b1 + 3b2 + 9b3 = -30.8

Solve (1') for b1:

  (1'')  b1 = -13.0 + 3b2 - 9b3

Substitute b1 (1'') in (2'), (4'):

  (2'')  b2 - 5b3 = -5.5
  (4'')  b2 = -2.96667                                  b2 = -2.96667

Substitute b2 (4'') in (2''):

  (2''') b3 = 0.506667                                  b3 = 0.506667

Substitute b2 (4'') and b3 (2''') in (1''):

         b1 = -26.46                                    b1 = -26.46
As an exploratory exercise, the exact solution using the data
segment centered at t = 2127 was also established. The results are
presented in columns 8 and 9 in Table 10a. It is interesting to
note that the observed value at t = 2129 does not appear as a
potential outlier in the recalculated fourth order sequential
differences. Neither does the observation at t = 2130. Since
subsequent fourth order differences are not affected, there is no
potential outlier remaining when this data segment is used.
IV. Conclusions and Recommendations

Questionable data values, either potential outliers or
temporary values for missing points, degrade the quality of smoothed
estimates of points on a vehicular path. A position location system
which omits observations at every eighth observational time
(scheduled missing points) makes the treatment of other
questionable values more difficult and, if the latter are frequent,
can even preclude the use of smoothing.

Although potential outliers are treated the same way as
missing points in smoothing a specific data segment, they can produce
greater contamination of the smoothing process and should be given
priority in any smoothing algorithm. Also, on replacement of a
potential outlier by a smoothed value, sequential differences
should be recalculated to determine whether other potential
outliers occur in its vicinity. It is important, wherever possible,
to establish whether a potential outlier is actually an outlier
(a wild observational value) or is an indicator of a change in a
vehicular path that cannot be adequately represented by a
polynomial of order three or less. Automation of this identification
of the cause for a potential outlier may be facilitated by other
sources of information on changes in vehicular paths such as
internal control data. An alternative source of this information
is manual observation of a plot of the observed data points to
establish points at which the vehicular path has changed so that
it cannot be expected to be represented by a polynomial of order
three or less. (The latter reduces the extent to which automation
can be achieved and hence the incorporation of internal control
data into the smoothing process is preferred.)

Isolated questionable values cause little problem since they
can be treated simply by iterated smoothing to establish
replacement estimated values consistent with the other observations
in the 7-point data segment centered at the time of the questionable
value. The presence of more than one questionable value requires
more complex treatment. Occurrences of two and of three such values
require different treatments and were discussed separately. If
more than three questionable values occur in a 7-point data segment
the 7-point least squares smoothing procedure is not applicable.
(Polynomials of order one or two could still be considered,
depending on the number of questionable values, but should be
avoided since their ability to represent actual vehicular paths is
questionable.) Such data segments should be identified for both
potential users of the smoothed data and data collectors.

When there are two questionable values close to each other,
both their nature (missing point or potential outlier) and their
time separation need to be considered in establishing the
appropriate treatment. The following cases and their treatments
appear reasonable:

a. Adjacent questionable values.

(1) If the two questionable values consist of a potential
outlier and a missing point, then the two should be
smoothed simultaneously using the data segment
centered at the time of the potential outlier.
(2) If two adjacent questionable values are both missing
points, then they should also be smoothed
simultaneously using the data segment centered at the time
of one of them. (The choice of center may affect the
resulting smoothed values but no general rule for
preference can be given.)

(3) (Situations in which adjacent questionable values are
both potential outliers appear to be unlikely, so this
case is not considered.)

b. Two questionable values separated by a single observation.

For reasons of simplicity of the smoothing
algorithm and reduction in computation, the two values
should be smoothed simultaneously using the data segment
centered at the observation time between the two
questionable values.

c. Two questionable values separated by more than one
observation.

Since at least one of the questionable values cannot
be adjacent to the 7-point segment center, it would be
reasonable to smooth first one, then the other, returning
to the first for resmoothing. Priority of smoothing is
for potential outliers and, if both are potential
outliers, the first smoothed should be the one with the
largest fourth order sequential difference.

Situations involving three questionable values could
be smoothed simultaneously using a data segment centered so
that all three are as close to the center of the segment
as possible. A substantial number of iterations may be
required to bring the three residual errors to within the
specified level. It would appear preferable here to omit
smoothing and to fit the remaining four points in the data
segment using simultaneous linear equations to determine
the coefficients of the cubic equation that fits these four
points exactly. (It would be possible to use smoothing
limiting the polynomial to order two or less but, again,
the question of adequate representation of the target
path arises.) Whether simultaneous smoothing or the exact
fit is used, the procedure, in essence, treats the noise
components of the four observations as part of the
vehicular path instead of noise. Thus a reduction in the
quality of the estimates is introduced and this
information should be indicated to both potential users and data
collectors.
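
The recommendations for two questionable values (cases a through c above) can be collected into a small decision function. The Python sketch below is one possible encoding, not a specification: the function name, the tuple layout (time, kind, |Δ4|) and the returned labels are hypothetical. Where the report gives no rule (two adjacent missing points, or two missing points treated sequentially), the sketch arbitrarily starts with the earlier time, and this choice is an assumption.

```python
def two_value_treatment(first, second):
    """Suggest a treatment for two questionable values in one neighbourhood.

    Each argument is (time, kind, abs_d4) with kind "W" (potential outlier)
    or "M" (missing point). Returns (procedure, time), where time is the
    segment center for "simultaneous" or the value to smooth first for
    "sequential".
    """
    (t1, kind1, d4_1), (t2, kind2, d4_2) = sorted([first, second])
    separation = t2 - t1 - 1          # observations between the two values
    if separation == 0:
        # Case a: center on the potential outlier if there is one;
        # otherwise the earlier time is used (assumption, no rule is given).
        center = t2 if (kind1 == "M" and kind2 == "W") else t1
        return ("simultaneous", center)
    if separation == 1:
        # Case b: center on the observation between the two values.
        return ("simultaneous", t1 + 1)
    # Case c: smooth one, then the other, then return to the first.
    if kind1 == "W" and kind2 == "W":
        start = t1 if abs(d4_1) >= abs(d4_2) else t2
    elif kind2 == "W":
        start = t2
    else:
        start = t1
    return ("sequential", start)


# Examples taken from Sections III C 3, III C 5 and III C 6
adjacent = two_value_treatment((2175, "W", 63.1), (2176, "M", 0.0))
one_apart = two_value_treatment((2111, "M", 0.0), (2113, "M", 0.0))
two_apart = two_value_treatment((2073, "M", 0.0), (2076, "M", 0.0))
```

Segments with three or more questionable values fall outside this function and would be routed to the exact-fit treatment described in the preceding paragraph.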

The material presented in this report has emphasized
details which should be useful in understanding the
smoothing process and in implementing an appropriate
program for smoothing 3-D data at NUWES. It also provides
essential background for an investigation of the quality
of 3-D data and for the establishment of Figures of Merit
for 3-D data submitted for smoothing, which is to follow.